
Chapter 16: Flattening Nested Structures

The Challenge of Arbitrary Depth

We've worked with recursion on lists, trees, and intervals, but what happens when the structure itself is unpredictable? When you don't know how deep the nesting goes? When a list might contain lists, which contain lists, which contain... anything?

This is the domain of arbitrary-depth recursion: processing structures where the depth is unknown at design time and can vary wildly at runtime.

The Real-World Problem: Configuration File Processing

Consider a realistic scenario: you're building a system that processes configuration files. These configs can be nested to arbitrary depth:

config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432,
            "credentials": {
                "username": "admin",
                "password": "secret123"
            }
        },
        "replicas": [
            {
                "host": "db2.example.com",
                "port": 5432
            },
            {
                "host": "db3.example.com",
                "port": 5432
            }
        ]
    },
    "cache": {
        "redis": {
            "nodes": [
                {"host": "cache1.example.com", "port": 6379},
                {"host": "cache2.example.com", "port": 6379}
            ]
        }
    },
    "feature_flags": {
        "new_ui": True,
        "beta_features": {
            "advanced_search": False,
            "ai_suggestions": True
        }
    }
}

Your task: Find all password fields in this configuration and mask them for logging purposes. The problem? You don't know how deep the nesting goes, which keys hold sensitive values, or whether any given value will be a dict, a list, or an atomic value.

This is our anchor example for this chapter. We'll build mask_sensitive_data(config) and refine it through multiple iterations as we encounter different failure modes.

Why This Problem Demands Recursion

Let's first see why an iterative approach becomes unwieldy:

def mask_sensitive_iterative(data):
    """Attempt to mask passwords iteratively - quickly becomes complex"""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if key == "password":
                result[key] = "***MASKED***"
            elif isinstance(value, dict):
                # Need to process nested dict... but how deep?
                result[key] = mask_sensitive_iterative(value)  # Wait, we're recursing anyway!
            elif isinstance(value, list):
                # Need to process list items... but they might be dicts or lists!
                result[key] = [mask_sensitive_iterative(item) for item in value]
            else:
                result[key] = value
        return result
    elif isinstance(data, list):
        return [mask_sensitive_iterative(item) for item in data]
    else:
        return data

# Test it
import json
masked = mask_sensitive_iterative(config)
print(json.dumps(masked, indent=2))

Python Output:

{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432,
      "credentials": {
        "username": "admin",
        "password": "***MASKED***"
      }
    },
    "replicas": [
      {
        "host": "db2.example.com",
        "port": 5432
      },
      {
        "host": "db3.example.com",
        "port": 5432
      }
    ]
  },
  "cache": {
    "redis": {
      "nodes": [
        {
          "host": "cache1.example.com",
          "port": 6379
        },
        {
          "host": "cache2.example.com",
          "port": 6379
        }
      ]
    }
  },
  "feature_flags": {
    "new_ui": true,
    "beta_features": {
      "advanced_search": false,
      "ai_suggestions": true
    }
  }
}

The insight: Even when trying to write an "iterative" solution, we ended up using recursion! The structure of the problem (arbitrary nesting) naturally leads to a recursive solution. The alternative would be maintaining explicit stacks and state machines, which is far more complex.

The Core Pattern: Type-Based Recursion

When processing nested structures, we follow this pattern:

  1. Identify the base types (strings, numbers, booleans, None)
  2. Identify the container types (lists, dicts, tuples, sets)
  3. Recurse on containers, process base types

This is fundamentally different from list recursion (first + rest) or tree recursion (node + children). Here, the structure itself determines the recursion pattern.
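
To make this concrete before we start iterating, here is a minimal sketch of the type-dispatch skeleton the pattern implies (the name visit is illustrative, not part of the chapter's implementations; sets would get an analogous branch):

def visit(value):
    """Skeleton of type-based recursion: dispatch on the value's type."""
    if isinstance(value, dict):
        # Container: keep the keys, recurse on the values
        return {key: visit(item) for key, item in value.items()}
    elif isinstance(value, (list, tuple)):
        # Container: recurse on the items, keep the original sequence type
        return type(value)(visit(item) for item in value)
    else:
        # Base types (str, int, float, bool, None): stop recursing
        return value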

Iteration 0: The Naive Approach

Let's start with the simplest possible implementation and see where it breaks.

The Initial Implementation

def flatten_list(nested_list):
    """
    Attempt to flatten a nested list to a single level.
    This is our starting point - it will fail in interesting ways.
    """
    result = []
    for item in nested_list:
        if isinstance(item, list):
            # It's a list, so recurse
            flattened = flatten_list(item)
            result.extend(flattened)
        else:
            # It's not a list, so add it
            result.append(item)
    return result

# Test with simple nested list
simple = [1, [2, 3], [4, [5, 6]], 7]
print(f"Input: {simple}")
print(f"Output: {flatten_list(simple)}")

Python Output:

Input: [1, [2, 3], [4, [5, 6]], 7]
Output: [1, 2, 3, 4, 5, 6, 7]

Great! It works for nested lists. Let's trace the execution to understand what's happening:

Execution Trace:

flatten_list([1, [2, 3], [4, [5, 6]], 7])
  ↓ item=1 (not list) → append 1
  ↓ item=[2, 3] (is list) → recurse
    flatten_list([2, 3])
      ↓ item=2 (not list) → append 2
      ↓ item=3 (not list) → append 3
      ↑ return [2, 3]
  ↓ extend with [2, 3]
  ↓ item=[4, [5, 6]] (is list) → recurse
    flatten_list([4, [5, 6]])
      ↓ item=4 (not list) → append 4
      ↓ item=[5, 6] (is list) → recurse
        flatten_list([5, 6])
          ↓ item=5 (not list) → append 5
          ↓ item=6 (not list) → append 6
          ↑ return [5, 6]
      ↓ extend with [5, 6]
      ↑ return [4, 5, 6]
  ↓ extend with [4, 5, 6]
  ↓ item=7 (not list) → append 7
  ↑ return [1, 2, 3, 4, 5, 6, 7]

The First Failure: Mixed Types

Now let's try something more realistic:

# Test with mixed types
mixed = [1, [2, "hello"], [3, [4, True]], {"key": "value"}]
print(f"Input: {mixed}")
try:
    result = flatten_list(mixed)
    print(f"Output: {result}")
except Exception as e:
    print(f"Error: {type(e).__name__}: {e}")
    import traceback
    traceback.print_exc()

Python Output:

Input: [1, [2, 'hello'], [3, [4, True]], {'key': 'value'}]
Output: [1, 2, 'hello', 3, 4, True, {'key': 'value'}]

Wait, it worked! But look closely at the output: {'key': 'value'} is still there as a dict. We didn't flatten it. Is this correct?

It depends on what "flatten" means. For a list-only flattener, this is correct behavior. But for our configuration masking problem, we need to handle dicts too.

The Second Failure: Tuples and Other Sequences

Let's test with tuples:

# Test with tuples
with_tuples = [1, (2, 3), [4, (5, 6)]]
print(f"Input: {with_tuples}")
result = flatten_list(with_tuples)
print(f"Output: {result}")
print(f"Types: {[type(x) for x in result]}")

Python Output:

Input: [1, (2, 3), [4, (5, 6)]]
Output: [1, (2, 3), 4, (5, 6)]
Types: [<class 'int'>, <class 'tuple'>, <class 'int'>, <class 'tuple'>]

Problem identified: Tuples aren't being flattened! Our isinstance(item, list) check is too narrow.

Diagnostic Analysis: Understanding the Limitation

The core issue: We're checking for a specific type (list) rather than checking for the behavior we care about (being iterable and container-like).

What we need:

  - Recognize all sequence types (list, tuple)
  - Distinguish between sequences and strings (strings are iterable but shouldn't be flattened)
  - Handle dictionaries specially (they're iterable but need different processing)

Root cause: Type-specific checking instead of behavior-based checking.

What we need: A more sophisticated type detection system that understands the hierarchy of Python's data structures.
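
One way to express that behavior in code is to give the check a name. A minimal sketch (the helper is_flattenable is an assumption, not part of the chapter's code); the chapter's own implementations stick with an explicit (list, tuple) check, and this version shows what must be excluded if you reach for collections.abc.Sequence instead:

from collections.abc import Sequence

def is_flattenable(item):
    """True for container-like sequences we want to flatten."""
    # Strings and bytes are also Sequences, but iterating them yields more
    # strings, so they must stay atomic (see the trap in the next iteration).
    return isinstance(item, Sequence) and not isinstance(item, (str, bytes))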

Iteration 1: Handling Multiple Sequence Types

The Improved Implementation

Let's fix the tuple problem by checking for multiple sequence types:

def flatten_sequences(nested):
    """
    Flatten nested sequences (lists and tuples).
    Handles multiple sequence types but treats strings as atomic.
    """
    result = []
    for item in nested:
        # Check if it's a sequence type we want to flatten
        if isinstance(item, (list, tuple)):
            flattened = flatten_sequences(item)
            result.extend(flattened)
        else:
            result.append(item)
    return result

# Test with mixed sequences
mixed_sequences = [1, (2, 3), [4, (5, [6, 7])], "hello"]
print(f"Input: {mixed_sequences}")
result = flatten_sequences(mixed_sequences)
print(f"Output: {result}")
print(f"Types: {[type(x).__name__ for x in result]}")

Python Output:

Input: [1, (2, 3), [4, (5, [6, 7])], 'hello']
Output: [1, 2, 3, 4, 5, 6, 7, 'hello']
Types: ['int', 'int', 'int', 'int', 'int', 'int', 'int', 'str']

Improvement: Now tuples are flattened correctly, and strings remain atomic (not split into characters).

Why Strings Need Special Treatment

Let's see what happens if we don't exclude strings:

def flatten_sequences_broken(nested):
    """
    BROKEN: Treats strings as sequences to flatten.
    This demonstrates why we need special handling.
    """
    result = []
    for item in nested:
        # This check includes strings because strings are sequences!
        if isinstance(item, (list, tuple, str)):
            flattened = flatten_sequences_broken(item)
            result.extend(flattened)
        else:
            result.append(item)
    return result

# Test with strings
test = [1, "hello", [2, "world"]]
print(f"Input: {test}")
try:
    result = flatten_sequences_broken(test)
    print(f"Output: {result}")
except RecursionError as e:
    print(f"RecursionError: {e}")
    print("\nThis happens because:")
    print("1. 'hello' is a string")
    print("2. Strings are sequences in Python")
    print("3. We recurse on 'hello'")
    print("4. 'hello'[0] is 'h', which is also a string")
    print("5. We recurse on 'h'")
    print("6. 'h'[0] is 'h' again!")
    print("7. Infinite recursion!")

Python Output:

Input: [1, 'hello', [2, 'world']]
RecursionError: maximum recursion depth exceeded while calling a Python object

This happens because:
1. 'hello' is a string
2. Strings are sequences in Python
3. We recurse on 'hello'
4. 'hello'[0] is 'h', which is also a string
5. We recurse on 'h'
6. 'h'[0] is 'h' again!
7. Infinite recursion!

Diagnostic Analysis: The String Recursion Trap

The complete execution trace (before stack overflow):

flatten_sequences_broken([1, "hello", [2, "world"]])
  ↓ item=1 (not sequence) → append 1
  ↓ item="hello" (is str, matches isinstance check) → recurse
    flatten_sequences_broken("hello")
      ↓ item="h" (is str, matches isinstance check) → recurse
        flatten_sequences_broken("h")
          ↓ item="h" (is str, matches isinstance check) → recurse
            flatten_sequences_broken("h")
              ↓ item="h" (is str, matches isinstance check) → recurse
                [... 996 more times ...]
RecursionError: maximum recursion depth exceeded

Let's parse this section by section:

  1. The error message: RecursionError: maximum recursion depth exceeded
     - What this tells us: we hit Python's recursion limit (default 1000)

  2. The call stack depth:
     - Current depth: 1000
     - Expected depth: should be proportional to the nesting depth (2-3 levels)
     - Key observation: depth grows without bound because strings recurse infinitely

  3. Variable values at failure: item = "h", isinstance(item, (list, tuple, str)) = True
     - What this tells us: single-character strings still match our type check
     - Critical insight: "h"[0] returns "h", creating a cycle

  4. The recursive call pattern:
     - Expected: recurse on containers until we hit non-containers
     - Actual: strings are containers of strings, so we never hit a base case
     - Key difference: no natural base case for strings

Root cause identified: Strings are sequences but shouldn't be treated as containers to flatten.

Why the current approach can't solve this: Type checking alone isn't enough; we need a semantic understanding of what "atomic" means.

What we need: Explicit exclusion of string types from the recursion logic.

The Fix: Explicit String Exclusion

Our corrected version already handles this:

# Our working version excludes strings
def flatten_sequences(nested):
    result = []
    for item in nested:
        # Only recurse on list and tuple, NOT str
        if isinstance(item, (list, tuple)):
            flattened = flatten_sequences(item)
            result.extend(flattened)
        else:
            result.append(item)
    return result

# Verify it works
test = [1, "hello", [2, "world", (3, "!")]]
print(f"Input: {test}")
result = flatten_sequences(test)
print(f"Output: {result}")

Python Output:

Input: [1, 'hello', [2, 'world', (3, '!')]]
Output: [1, 'hello', 2, 'world', 3, '!']

Verification: Strings remain intact, sequences are flattened.

Current Limitation

This solves sequences, but we still can't handle our original problem: dictionaries. Let's see what happens:

# Test with our config structure
simple_config = {
    "database": {
        "host": "localhost",
        "credentials": {
            "password": "secret123"
        }
    }
}

print(f"Input: {simple_config}")
result = flatten_sequences(simple_config)
print(f"Output: {result}")

Python Output:

Input: {'database': {'host': 'localhost', 'credentials': {'password': 'secret123'}}}
Output: ['database']

Problem: All we get back is the dictionary's top-level key; the values, and everything nested inside them, are gone. Dictionaries are iterable over their keys, so the dict was treated as a sequence of keys and its structure was discarded.

What we need: Special handling for dictionaries that preserves their key-value structure while recursing on values.

Iteration 2: Adding Dictionary Support

The Challenge: Dictionaries Are Different

Dictionaries require fundamentally different processing:

  - Lists/tuples: Flatten items into a single sequence
  - Dictionaries: Preserve key-value structure, recurse on values

Let's build a function that handles both:

def process_nested_structure(data):
    """
    Process nested structures, handling lists, tuples, and dicts.
    Returns the same structure type with processed contents.
    """
    if isinstance(data, dict):
        # For dicts: preserve structure, recurse on values
        result = {}
        for key, value in data.items():
            result[key] = process_nested_structure(value)
        return result

    elif isinstance(data, (list, tuple)):
        # For sequences: recurse on items, preserve type
        processed = [process_nested_structure(item) for item in data]
        # Return same type as input (list or tuple)
        return type(data)(processed)

    else:
        # Base case: atomic value (int, str, bool, None, etc.)
        return data

# Test with mixed structure
mixed = {
    "numbers": [1, 2, [3, 4]],
    "nested": {
        "deep": {
            "value": 42
        }
    },
    "tuple": (1, (2, 3))
}

import json
result = process_nested_structure(mixed)
print("Input:")
print(json.dumps(mixed, indent=2, default=str))
print("\nOutput:")
print(json.dumps(result, indent=2, default=str))

Python Output:

Input:
{
  "numbers": [
    1,
    2,
    [
      3,
      4
    ]
  ],
  "nested": {
    "deep": {
      "value": 42
    }
  },
  "tuple": "(1, (2, 3))"
}

Output:
{
  "numbers": [
    1,
    2,
    [
      3,
      4
    ]
  ],
  "nested": {
    "deep": {
      "value": 42
    }
  },
  "tuple": "(1, (2, 3))"
}

Observation: The structure is preserved, but we're not actually doing anything with the values yet. (Note that json.dumps renders tuples as JSON arrays, so the tuple looks like a list in the printed output even though the Python object keeps its tuple type.) This is a structure-preserving traversal: the foundation for any nested data processing.

Applying Transformations: Masking Passwords

Now let's use this framework to solve our original problem:

def mask_sensitive_data(data, sensitive_keys=None):
    """
    Recursively traverse nested structure and mask sensitive values.

    Args:
        data: The nested structure to process
        sensitive_keys: Set of keys to mask (default: {"password", "secret", "token"})

    Returns:
        New structure with sensitive values masked
    """
    if sensitive_keys is None:
        sensitive_keys = {"password", "secret", "token", "api_key"}

    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            # Check if this key is sensitive
            if key.lower() in sensitive_keys:
                result[key] = "***MASKED***"
            else:
                # Recurse on the value
                result[key] = mask_sensitive_data(value, sensitive_keys)
        return result

    elif isinstance(data, (list, tuple)):
        # Recurse on sequence items
        processed = [mask_sensitive_data(item, sensitive_keys) for item in data]
        return type(data)(processed)

    else:
        # Base case: return atomic value unchanged
        return data

# Test with our config
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432,
            "credentials": {
                "username": "admin",
                "password": "secret123"
            }
        },
        "replicas": [
            {
                "host": "db2.example.com",
                "password": "replica_pass"
            }
        ]
    },
    "api_keys": {
        "stripe": "sk_live_abc123",
        "sendgrid": "SG.xyz789"
    }
}

masked = mask_sensitive_data(config)
print(json.dumps(masked, indent=2))

Python Output:

{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432,
      "credentials": {
        "username": "admin",
        "password": "***MASKED***"
      }
    },
    "replicas": [
      {
        "host": "db2.example.com",
        "password": "***MASKED***"
      }
    ]
  },
  "api_keys": {
    "stripe": "sk_live_abc123",
    "sendgrid": "SG.xyz789"
  }
}

Success! Passwords are masked at any depth, in both dicts and lists.

Execution Trace: Following the Recursion

Let's trace a simplified example to see the recursion in action:

# Simplified example for tracing
simple = {
    "user": "admin",
    "password": "secret",
    "settings": {
        "theme": "dark",
        "api_key": "abc123"
    }
}

def mask_with_trace(data, sensitive_keys=None, depth=0):
    """Version with execution tracing"""
    indent = "  " * depth

    if sensitive_keys is None:
        sensitive_keys = {"password", "api_key"}

    if isinstance(data, dict):
        print(f"{indent}Processing dict with keys: {list(data.keys())}")
        result = {}
        for key, value in data.items():
            if key.lower() in sensitive_keys:
                print(f"{indent}  Key '{key}' is sensitive β†’ masking")
                result[key] = "***MASKED***"
            else:
                print(f"{indent}  Key '{key}' β†’ recursing on value")
                result[key] = mask_with_trace(value, sensitive_keys, depth + 1)
        return result

    else:
        print(f"{indent}Base case: {repr(data)}")
        return data

print("Execution trace:")
result = mask_with_trace(simple)
print("\nResult:")
print(json.dumps(result, indent=2))

Python Output:

Execution trace:
Processing dict with keys: ['user', 'password', 'settings']
  Key 'user' → recursing on value
  Base case: 'admin'
  Key 'password' is sensitive → masking
  Key 'settings' → recursing on value
  Processing dict with keys: ['theme', 'api_key']
    Key 'theme' → recursing on value
    Base case: 'dark'
    Key 'api_key' is sensitive → masking

Result:
{
  "user": "admin",
  "password": "***MASKED***",
  "settings": {
    "theme": "dark",
    "api_key": "***MASKED***"
  }
}

Key observations:

  1. We recurse on dict values, not keys
  2. Sensitive keys are detected at any depth
  3. Base cases (strings, numbers) stop the recursion
  4. The structure is preserved throughout

Current Limitation: Case Sensitivity

Notice we used key.lower() in our check. What if the config uses different casing?

# Test with mixed casing
mixed_case = {
    "Password": "secret1",
    "PASSWORD": "secret2",
    "password": "secret3",
    "passWord": "secret4"
}

result = mask_sensitive_data(mixed_case)
print(json.dumps(result, indent=2))

Python Output:

{
  "Password": "***MASKED***",
  "PASSWORD": "***MASKED***",
  "password": "***MASKED***",
  "passWord": "***MASKED***"
}

Good! Our key.lower() handles this correctly. But what about partial matches?

# Test with partial matches
partial = {
    "user_password": "secret1",
    "db_password_hash": "secret2",
    "password_reset_token": "secret3",
    "old_password": "secret4"
}

result = mask_sensitive_data(partial)
print(json.dumps(result, indent=2))

Python Output:

{
  "user_password": "secret1",
  "db_password_hash": "secret2",
  "password_reset_token": "secret3",
  "old_password": "secret4"
}

Problem: None of these were masked! Our exact match check misses compound keys.

What we need: Pattern matching that can detect sensitive substrings in keys.

Iteration 3: Pattern Matching and Flexibility

The Enhanced Implementation

Let's add pattern matching to catch more sensitive data:

def mask_sensitive_data_advanced(data, sensitive_patterns=None):
    """
    Recursively mask sensitive data with pattern matching.

    Args:
        data: Nested structure to process
        sensitive_patterns: List of patterns to match (substrings)

    Returns:
        New structure with sensitive values masked
    """
    if sensitive_patterns is None:
        sensitive_patterns = ["password", "secret", "token", "api_key", "private_key"]

    def is_sensitive(key):
        """Check if key contains any sensitive pattern"""
        key_lower = key.lower()
        return any(pattern in key_lower for pattern in sensitive_patterns)

    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if is_sensitive(key):
                result[key] = "***MASKED***"
            else:
                result[key] = mask_sensitive_data_advanced(value, sensitive_patterns)
        return result

    elif isinstance(data, (list, tuple)):
        processed = [mask_sensitive_data_advanced(item, sensitive_patterns) for item in data]
        return type(data)(processed)

    else:
        return data

# Test with compound keys
compound = {
    "user_password": "secret1",
    "db_password_hash": "secret2",
    "password_reset_token": "secret3",
    "old_password": "secret4",
    "username": "admin",  # Should NOT be masked
    "api_key_production": "key123"
}

result = mask_sensitive_data_advanced(compound)
print(json.dumps(result, indent=2))

Python Output:

{
  "user_password": "***MASKED***",
  "db_password_hash": "***MASKED***",
  "password_reset_token": "***MASKED***",
  "old_password": "***MASKED***",
  "username": "admin",
  "api_key_production": "***MASKED***"
}

Success! Now we catch all password-related keys, including compound names.

Adding Custom Masking Functions

Sometimes you want different masking strategies for different data types:

def mask_with_strategy(data, sensitive_patterns=None, mask_fn=None):
    """
    Recursively mask sensitive data with custom masking function.

    Args:
        data: Nested structure to process
        sensitive_patterns: List of patterns to match
        mask_fn: Function to apply to sensitive values (default: replace with "***MASKED***")

    Returns:
        New structure with sensitive values masked
    """
    if sensitive_patterns is None:
        sensitive_patterns = ["password", "secret", "token", "api_key"]

    if mask_fn is None:
        mask_fn = lambda x: "***MASKED***"

    def is_sensitive(key):
        key_lower = key.lower()
        return any(pattern in key_lower for pattern in sensitive_patterns)

    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if is_sensitive(key):
                result[key] = mask_fn(value)
            else:
                result[key] = mask_with_strategy(value, sensitive_patterns, mask_fn)
        return result

    elif isinstance(data, (list, tuple)):
        processed = [mask_with_strategy(item, sensitive_patterns, mask_fn) for item in data]
        return type(data)(processed)

    else:
        return data

# Example: Show first 3 characters of sensitive strings
def partial_mask(value):
    if isinstance(value, str) and len(value) > 3:
        return value[:3] + "***"
    return "***"

config = {
    "database": {
        "password": "secret123456",
        "api_key": "sk_live_abcdefghijk"
    },
    "user": "admin"
}

result = mask_with_strategy(config, mask_fn=partial_mask)
print(json.dumps(result, indent=2))

Python Output:

{
  "database": {
    "password": "sec***",
    "api_key": "sk_***"
  },
  "user": "admin"
}

Flexibility achieved: Now we can customize how values are masked while maintaining the recursive structure traversal.

When to Apply This Solution

What it optimizes for:

  - Flexibility: handle any nested structure depth
  - Extensibility: easy to add new patterns or masking strategies
  - Correctness: preserves structure while transforming values

What it sacrifices:

  - Performance: creates new data structures (not in-place)
  - Memory: call stack depth proportional to nesting depth

When to choose this approach:

  - Configuration file processing
  - JSON/API response sanitization
  - Logging sensitive data
  - Data export/import transformations
  - Any tree-like data structure traversal

When to avoid this approach:

  - Very deep nesting (>100 levels): risk of stack overflow
  - Performance-critical hot paths: consider an iterative version with an explicit stack (see the sketch below)
  - In-place modification required: needs a different algorithm

Complexity characteristics:

  - Time Complexity: O(n), where n is the total number of values in the structure
  - Space Complexity: O(d), where d is the maximum nesting depth (call stack)
  - Recurrence Relation: T(n) = k·T(n/k) + O(1), where k is the branching factor
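
For the performance-critical case just mentioned, here is a minimal sketch of the iterative-with-explicit-stack alternative (the name flatten_iteratively is illustrative, and it assumes only lists and tuples need flattening):

def flatten_iteratively(nested):
    """Flatten nested lists/tuples without recursion, using an explicit stack of iterators."""
    result = []
    stack = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            # Finished the current container; drop back to its parent
            stack.pop()
            continue
        if isinstance(item, (list, tuple)):
            # Descend into the nested container without growing the call stack
            stack.append(iter(item))
        else:
            result.append(item)
    return result

# Depth is now limited by memory, not by sys.getrecursionlimit()
print(flatten_iteratively([1, [2, [3, [4]]], (5, 6)]))  # [1, 2, 3, 4, 5, 6]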

Flattening vs. Transforming: Different Goals

We've been building structure-preserving transformations. But sometimes you actually want to flatten nested structures into a single level. Let's explore both approaches.

True Flattening: Nested Lists to Flat List

This is the classic "flatten" operation:

def flatten_completely(nested):
    """
    Flatten any nested structure of lists/tuples into a single flat list.
    All nesting is removed.
    """
    result = []

    if isinstance(nested, (list, tuple)):
        for item in nested:
            # Recurse on each item
            flattened = flatten_completely(item)
            result.extend(flattened)
    else:
        # Base case: atomic value
        result.append(nested)

    return result

# Test with deeply nested structure
deeply_nested = [1, [2, [3, [4, [5, [6, [7]]]]]]]
print(f"Input: {deeply_nested}")
print(f"Output: {flatten_completely(deeply_nested)}")

# Test with mixed nesting
mixed = [1, [2, 3], [[4, 5], [6, [7, 8]]], 9]
print(f"\nInput: {mixed}")
print(f"Output: {flatten_completely(mixed)}")

Python Output:

Input: [1, [2, [3, [4, [5, [6, [7]]]]]]]
Output: [1, 2, 3, 4, 5, 6, 7]

Input: [1, [2, 3], [[4, 5], [6, [7, 8]]], 9]
Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Flattening Dictionaries: Key Paths

For dictionaries, "flattening" means converting nested keys into dot-notation paths:

def flatten_dict(nested_dict, parent_key='', separator='.'):
    """
    Flatten nested dictionary into single-level dict with dot-notation keys.

    Example:
        {"a": {"b": {"c": 1}}} β†’ {"a.b.c": 1}
    """
    items = []

    for key, value in nested_dict.items():
        # Build the new key path
        new_key = f"{parent_key}{separator}{key}" if parent_key else key

        if isinstance(value, dict):
            # Recurse on nested dict
            items.extend(flatten_dict(value, new_key, separator).items())
        else:
            # Base case: add the flattened key-value pair
            items.append((new_key, value))

    return dict(items)

# Test with nested config
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432
        },
        "replica": {
            "host": "db2.example.com",
            "port": 5432
        }
    },
    "cache": {
        "redis": {
            "host": "cache.example.com"
        }
    }
}

flat = flatten_dict(config)
for key, value in sorted(flat.items()):
    print(f"{key}: {value}")

Python Output:

cache.redis.host: cache.example.com
database.primary.host: db1.example.com
database.primary.port: 5432
database.replica.host: db2.example.com
database.replica.port: 5432

Use case: This is extremely useful for:

  - Configuration file validation
  - Environment variable generation
  - Logging structured data
  - Database column mapping
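
For example, the environment-variable case is a one-liner once the dictionary is flat (the uppercase-and-underscore naming convention here is only an illustration):

# Reusing flatten_dict and the config defined above
flat = flatten_dict(config)
env_vars = {key.upper().replace(".", "_"): str(value) for key, value in flat.items()}
# e.g. "database.primary.host" becomes "DATABASE_PRIMARY_HOST"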

Execution Trace: Dictionary Flattening

Let's trace the execution to understand the key-building process:

def flatten_dict_traced(nested_dict, parent_key='', separator='.', depth=0):
    """Version with execution tracing"""
    indent = "  " * depth
    print(f"{indent}flatten_dict(parent_key='{parent_key}')")

    items = []

    for key, value in nested_dict.items():
        new_key = f"{parent_key}{separator}{key}" if parent_key else key
        print(f"{indent}  Processing key '{key}' β†’ new_key='{new_key}'")

        if isinstance(value, dict):
            print(f"{indent}    Value is dict β†’ recursing")
            items.extend(flatten_dict_traced(value, new_key, separator, depth + 1).items())
        else:
            print(f"{indent}    Value is {type(value).__name__} β†’ adding ('{new_key}', {repr(value)})")
            items.append((new_key, value))

    return dict(items)

# Trace a simple example
simple = {
    "db": {
        "host": "localhost",
        "port": 5432
    },
    "debug": True
}

print("Execution trace:")
result = flatten_dict_traced(simple)
print("\nResult:")
for key, value in sorted(result.items()):
    print(f"  {key}: {value}")

Python Output:

Execution trace:
flatten_dict(parent_key='')
  Processing key 'db' → new_key='db'
    Value is dict → recursing
  flatten_dict(parent_key='db')
    Processing key 'host' → new_key='db.host'
      Value is str → adding ('db.host', 'localhost')
    Processing key 'port' → new_key='db.port'
      Value is int → adding ('db.port', 5432)
  Processing key 'debug' → new_key='debug'
    Value is bool → adding ('debug', True)

Result:
  db.host: localhost
  db.port: 5432
  debug: True

Unflattening: The Reverse Operation

Once you can flatten, you often need to unflatten:

def unflatten_dict(flat_dict, separator='.'):
    """
    Convert flattened dictionary back to nested structure.

    Example:
        {"a.b.c": 1} β†’ {"a": {"b": {"c": 1}}}
    """
    result = {}

    for flat_key, value in flat_dict.items():
        # Split the key into parts
        parts = flat_key.split(separator)

        # Navigate/create the nested structure
        current = result
        for part in parts[:-1]:  # All but the last part
            if part not in current:
                current[part] = {}
            current = current[part]

        # Set the final value
        current[parts[-1]] = value

    return result

# Test round-trip
original = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432
        }
    }
}

print("Original:")
print(json.dumps(original, indent=2))

flat = flatten_dict(original)
print("\nFlattened:")
for key, value in sorted(flat.items()):
    print(f"  {key}: {value}")

restored = unflatten_dict(flat)
print("\nRestored:")
print(json.dumps(restored, indent=2))

print("\nRound-trip successful:", original == restored)

Python Output:

Original:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432
    }
  }
}

Flattened:
  database.primary.host: db1.example.com
  database.primary.port: 5432

Restored:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432
    }
  }
}

Round-trip successful: True

Note: The unflatten_dict function is not recursive; it uses iteration to build the nested structure. This demonstrates that not every operation on nested data requires recursion. Choose the approach that makes the logic clearest.
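
If you prefer symmetry with flatten_dict, a recursive unflattener is also straightforward: group the flat keys by their first path segment, then recurse on each group. A minimal sketch (the function name is illustrative, and it assumes no key is both a leaf and a prefix of a longer key):

def unflatten_dict_recursive(flat_dict, separator='.'):
    result = {}
    groups = {}
    for flat_key, value in flat_dict.items():
        if separator in flat_key:
            # "a.b.c" -> head "a", rest "b.c"; collect the rest as a smaller flat dict
            head, rest = flat_key.split(separator, 1)
            groups.setdefault(head, {})[rest] = value
        else:
            result[flat_key] = value
    for head, sub in groups.items():
        # Each group is a smaller instance of the same unflattening problem
        result[head] = unflatten_dict_recursive(sub, separator)
    return result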

Real-World Application: JSON Path Queries

Let's build something practical: a JSON path query system that can extract values from nested structures using path expressions like "database.primary.host".

The Query System

def get_nested_value(data, path, separator='.', default=None):
    """
    Extract value from nested structure using dot-notation path.

    Args:
        data: Nested dict/list structure
        path: Dot-separated path (e.g., "database.primary.host")
        separator: Path separator (default: '.')
        default: Value to return if path not found

    Returns:
        Value at path, or default if not found
    """
    parts = path.split(separator)
    current = data

    for part in parts:
        if isinstance(current, dict):
            if part not in current:
                return default
            current = current[part]
        elif isinstance(current, list):
            # Handle list indices
            try:
                index = int(part)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            # Can't navigate further
            return default

    return current

# Test with complex structure
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432
        },
        "replicas": [
            {"host": "db2.example.com", "port": 5432},
            {"host": "db3.example.com", "port": 5432}
        ]
    },
    "cache": {
        "redis": {
            "nodes": [
                {"host": "cache1.example.com"},
                {"host": "cache2.example.com"}
            ]
        }
    }
}

# Test various paths
paths = [
    "database.primary.host",
    "database.primary.port",
    "database.replicas.0.host",
    "database.replicas.1.host",
    "cache.redis.nodes.0.host",
    "database.nonexistent",
    "database.replicas.99.host"
]

for path in paths:
    value = get_nested_value(config, path, default="NOT FOUND")
    print(f"{path:35} β†’ {value}")

Python Output:

database.primary.host               → db1.example.com
database.primary.port               → 5432
database.replicas.0.host            → db2.example.com
database.replicas.1.host            → db3.example.com
cache.redis.nodes.0.host            → cache1.example.com
database.nonexistent                → NOT FOUND
database.replicas.99.host           → NOT FOUND

Setting Values by Path

Now let's build the setter. Like the getter, it walks the path iteratively, creating intermediate dicts as it goes:

def set_nested_value(data, path, value, separator='.', create_missing=True):
    """
    Set value in nested structure using dot-notation path.
    Creates intermediate dicts if they don't exist.

    Args:
        data: Nested dict structure (modified in place)
        path: Dot-separated path
        value: Value to set
        separator: Path separator
        create_missing: Whether to create missing intermediate dicts

    Returns:
        True if successful, False if path invalid
    """
    parts = path.split(separator)

    # Navigate to parent of target
    current = data
    for part in parts[:-1]:
        if part not in current:
            if create_missing:
                current[part] = {}
            else:
                return False
        current = current[part]

        if not isinstance(current, dict):
            return False

    # Set the final value
    current[parts[-1]] = value
    return True

# Test setting values
config = {}

# Set some nested values
set_nested_value(config, "database.primary.host", "db1.example.com")
set_nested_value(config, "database.primary.port", 5432)
set_nested_value(config, "cache.redis.host", "cache.example.com")

print(json.dumps(config, indent=2))

Python Output:

{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432
    }
  },
  "cache": {
    "redis": {
      "host": "cache.example.com"
    }
  }
}

Recursive Path Finding: All Paths to a Value

Let's build a function that finds all paths to a specific value:

def find_all_paths(data, target_value, current_path=''):
    """
    Find all paths in nested structure that lead to target value.

    Args:
        data: Nested structure to search
        target_value: Value to find
        current_path: Current path (used in recursion)

    Returns:
        List of paths (as strings) where value was found
    """
    paths = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_path = f"{current_path}.{key}" if current_path else key

            if value == target_value:
                paths.append(new_path)

            # Recurse on the value
            paths.extend(find_all_paths(value, target_value, new_path))

    elif isinstance(data, list):
        for index, item in enumerate(data):
            new_path = f"{current_path}[{index}]"

            if item == target_value:
                paths.append(new_path)

            # Recurse on the item
            paths.extend(find_all_paths(item, target_value, new_path))

    return paths

# Test with structure containing duplicate values
config = {
    "database": {
        "primary": {
            "host": "db.example.com",
            "port": 5432
        },
        "replica": {
            "host": "db.example.com",  # Same host!
            "port": 5432
        }
    },
    "cache": {
        "port": 5432  # Same port!
    }
}

# Find all occurrences of port 5432
paths = find_all_paths(config, 5432)
print("All paths to value 5432:")
for path in paths:
    print(f"  {path}")

# Find all occurrences of the host
paths = find_all_paths(config, "db.example.com")
print("\nAll paths to value 'db.example.com':")
for path in paths:
    print(f"  {path}")

Python Output:

All paths to value 5432:
  database.primary.port
  database.replica.port
  cache.port

All paths to value 'db.example.com':
  database.primary.host
  database.replica.host

Execution Trace: Path Finding

Let's trace the path-finding recursion:

def find_all_paths_traced(data, target_value, current_path='', depth=0):
    """Version with execution tracing"""
    indent = "  " * depth
    print(f"{indent}find_all_paths(path='{current_path}', target={repr(target_value)})")

    paths = []

    if isinstance(data, dict):
        print(f"{indent}  Type: dict with keys {list(data.keys())}")
        for key, value in data.items():
            new_path = f"{current_path}.{key}" if current_path else key
            print(f"{indent}  Checking key '{key}' β†’ path='{new_path}'")

            if value == target_value:
                print(f"{indent}    βœ“ MATCH! Adding path '{new_path}'")
                paths.append(new_path)

            paths.extend(find_all_paths_traced(value, target_value, new_path, depth + 1))

    elif isinstance(data, list):
        print(f"{indent}  Type: list with {len(data)} items")
        for index, item in enumerate(data):
            new_path = f"{current_path}[{index}]"
            print(f"{indent}  Checking index {index} β†’ path='{new_path}'")

            if item == target_value:
                print(f"{indent}    βœ“ MATCH! Adding path '{new_path}'")
                paths.append(new_path)

            paths.extend(find_all_paths_traced(item, target_value, new_path, depth + 1))

    else:
        print(f"{indent}  Type: {type(data).__name__} = {repr(data)}")

    return paths

# Trace a simple example
simple = {
    "a": 1,
    "b": {
        "c": 1,
        "d": 2
    }
}

print("Execution trace:")
paths = find_all_paths_traced(simple, 1)
print(f"\nFound paths: {paths}")

Python Output:

Execution trace:
find_all_paths(path='', target=1)
  Type: dict with keys ['a', 'b']
  Checking key 'a' → path='a'
    ✓ MATCH! Adding path 'a'
  find_all_paths(path='a', target=1)
    Type: int = 1
  Checking key 'b' → path='b'
  find_all_paths(path='b', target=1)
    Type: dict with keys ['c', 'd']
    Checking key 'c' → path='b.c'
      ✓ MATCH! Adding path 'b.c'
    find_all_paths(path='b.c', target=1)
      Type: int = 1
    Checking key 'd' → path='b.d'
    find_all_paths(path='b.d', target=1)
      Type: int = 2

Found paths: ['a', 'b.c']

Key insight: The recursion naturally handles arbitrary nesting depth. We don't need to know how deep the structure is; the recursion explores every branch until it hits base cases (non-dict, non-list values).

Common Failure Modes and Their Signatures

Symptom: Infinite Recursion on Strings

Python output pattern:

RecursionError: maximum recursion depth exceeded while calling a Python object

Diagnostic clues: - Traceback shows same function called repeatedly with single-character strings - Call stack depth reaches ~1000 (Python's default limit) - Last few calls show: flatten("h") → flatten("h") → flatten("h")

Root cause: Strings are sequences in Python, so isinstance(item, (list, tuple, str)) matches strings. Since "h"[0] returns "h", recursion never terminates.

Solution: Explicitly exclude strings from sequence types: isinstance(item, (list, tuple)) only.


Symptom: Dictionary Keys Flattened Instead of Values

Python output pattern:

['database', 'host', 'credentials', 'password']

Diagnostic clues: - Output is a list of strings (the keys) - Original nested structure is lost - Values are missing entirely

Root cause: Dictionaries are iterable over their keys by default. When treated as a sequence, for item in dict iterates over keys, not key-value pairs.

Solution: Use isinstance(data, dict) check and iterate with data.items() to access both keys and values.


Symptom: Type Mismatch After Processing

Python output pattern:

TypeError: 'tuple' object does not support item assignment

Diagnostic clues: - Error occurs when trying to modify result - Input was a tuple, output is a list - Type preservation was expected but not maintained

Root cause: Processing converts all sequences to lists: [process(item) for item in data] always returns a list.

Solution: Preserve input type: type(data)([process(item) for item in data]) returns same type as input.


Symptom: Missing Nested Values

Python output pattern:

{
  "database": {
    "password": "secret123"
  },
  "api_keys": {
    "stripe": "sk_live_abc123"
  }
}

Diagnostic clues: - Some sensitive keys are masked, others are not - Unmasked keys contain sensitive patterns but don't match exactly - Keys like "user_password" or "api_key_production" are missed

Root cause: Exact string matching (key == "password") misses compound keys containing the sensitive pattern.

Solution: Use substring matching: any(pattern in key.lower() for pattern in sensitive_patterns).


Symptom: Stack Overflow on Deep Nesting

Python output pattern:

RecursionError: maximum recursion depth exceeded in comparison

Diagnostic clues: - Occurs with very deeply nested structures (>1000 levels) - Traceback shows legitimate recursive calls, not infinite loop - Structure is valid but exceeds Python's recursion limit

Root cause: Python's default recursion limit (1000) is exceeded by legitimate deep nesting.

Solution: Either increase limit with sys.setrecursionlimit() (risky) or convert to iterative approach with explicit stack.
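
If you do raise the limit, keep the change local and restore the old value afterwards. A hedged sketch (process_nested_data and very_deep_structure stand in for whatever function and data you are actually using):

import sys

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(10_000)  # a blunt instrument: a real stack overflow can still crash the interpreter
try:
    result = process_nested_data(very_deep_structure)
finally:
    sys.setrecursionlimit(old_limit)  # always restore the original limit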


Symptom: Circular Reference Infinite Loop

Python output pattern:

RecursionError: maximum recursion depth exceeded

Diagnostic clues: - Structure contains circular references (e.g., a["self"] = a) - Traceback shows same object being processed repeatedly - Memory address in repr() shows same object: <dict at 0x7f8b8c>

Root cause: Recursive traversal doesn't track visited objects, so circular references cause infinite loops.

Solution: Maintain a visited set of object IDs:

def process_with_cycle_detection(data, visited=None):
    """Process nested structure with circular reference detection"""
    if visited is None:
        visited = set()

    # Check if we've seen this object before
    obj_id = id(data)
    if obj_id in visited:
        return "[CIRCULAR REFERENCE]"

    # Mark as visited
    visited.add(obj_id)

    try:
        if isinstance(data, dict):
            result = {}
            for key, value in data.items():
                result[key] = process_with_cycle_detection(value, visited)
            return result
        elif isinstance(data, (list, tuple)):
            processed = [process_with_cycle_detection(item, visited) for item in data]
            return type(data)(processed)
        else:
            return data
    finally:
        # Remove from visited when backtracking (allows same object in different branches)
        visited.remove(obj_id)

# Test with circular reference
circular = {"a": 1}
circular["self"] = circular  # Circular reference!

result = process_with_cycle_detection(circular)
print(json.dumps(result, indent=2))

Python Output:

{
  "a": 1,
  "self": "[CIRCULAR REFERENCE]"
}

Debugging Workflow: When Your Nested Traversal Fails

Step 1: Identify the Failure Mode

What to look for in Python output:

  1. RecursionError: Infinite recursion (check for string handling or circular references)
  2. TypeError: Type mismatch (check type preservation or dict vs list handling)
  3. KeyError/IndexError: Path navigation failed (check bounds and existence)
  4. Wrong output: Logic error (trace execution with print statements)

Step 2: Add Strategic Print Statements

Where to place prints to visualize recursion:

def debug_nested_traversal(data, depth=0):
    """Template for debugging nested traversal"""
    indent = "  " * depth

    # Print entry point
    print(f"{indent}β†’ Entering: type={type(data).__name__}, depth={depth}")

    if isinstance(data, dict):
        print(f"{indent}  Dict with keys: {list(data.keys())}")
        result = {}
        for key, value in data.items():
            print(f"{indent}  Processing key: {repr(key)}")
            result[key] = debug_nested_traversal(value, depth + 1)
        return result

    elif isinstance(data, (list, tuple)):
        print(f"{indent}  {type(data).__name__} with {len(data)} items")
        processed = [debug_nested_traversal(item, depth + 1) for item in data]
        return type(data)(processed)

    else:
        print(f"{indent}  Base case: {repr(data)}")
        return data

Step 3: Manually Trace with Small Input

How to execute the function by hand:

  1. Start with the smallest possible input that fails
  2. Write down each recursive call
  3. Track the call stack depth
  4. Note when base cases are reached
  5. Verify return values propagate correctly

Example manual trace:

Input: {"a": {"b": 1}}

Call 1: process({"a": {"b": 1}})
  → isinstance(data, dict) = True
  → Iterate: key="a", value={"b": 1}
  → Recurse: process({"b": 1})

  Call 2: process({"b": 1})
    → isinstance(data, dict) = True
    → Iterate: key="b", value=1
    → Recurse: process(1)

    Call 3: process(1)
      → isinstance(data, dict) = False
      → isinstance(data, (list, tuple)) = False
      → Base case: return 1

    ← Return {"b": 1}

  ← Return {"a": {"b": 1}}

Step 4: Verify Base Cases

Checklist for base case validation:

  - Atomic types (int, float, bool, None) are returned unchanged
  - Strings are treated as atomic values, not iterated character by character
  - Empty dicts and lists come back as empty containers of the same type
  - Every branch of the function returns a value

Step 5: Test Edge Cases Systematically

Essential test cases for nested structure processing:

def test_nested_processing(process_fn):
    """Comprehensive test suite for nested structure processing"""

    test_cases = [
        # Empty structures
        ({}, "empty dict"),
        ([], "empty list"),

        # Single level
        ({"a": 1}, "single key-value"),
        ([1, 2, 3], "flat list"),

        # Nested structures
        ({"a": {"b": 1}}, "nested dict"),
        ([1, [2, 3]], "nested list"),

        # Mixed types
        ({"a": [1, 2], "b": {"c": 3}}, "mixed dict-list"),
        ([1, {"a": 2}, [3, 4]], "mixed list-dict"),

        # Deep nesting
        ({"a": {"b": {"c": {"d": 1}}}}, "deep dict"),
        ([[[[[1]]]]], "deep list"),

        # Strings (should not be recursed)
        ({"a": "hello"}, "string value"),
        (["hello", "world"], "string list"),

        # Special values
        ({"a": None, "b": True, "c": 3.14}, "special values"),

        # Tuples
        ((1, (2, 3)), "nested tuple"),
    ]

    print("Running test suite...")
    for test_input, description in test_cases:
        try:
            result = process_fn(test_input)
            print(f"βœ“ {description:25} β†’ {type(result).__name__}")
        except Exception as e:
            print(f"βœ— {description:25} β†’ {type(e).__name__}: {e}")

# Test our implementation
test_nested_processing(mask_sensitive_data_advanced)

Python Output:

Running test suite...
✓ empty dict                → dict
✓ empty list                → list
✓ single key-value          → dict
✓ flat list                 → list
✓ nested dict               → dict
✓ nested list               → list
✓ mixed dict-list           → dict
✓ mixed list-dict           → list
✓ deep dict                 → dict
✓ deep list                 → list
✓ string value              → dict
✓ string list               → list
✓ special values            → dict
✓ nested tuple              → tuple

Step 6: Apply the Fix

Decision tree for choosing the right technique:

Is the recursion infinite?
├─ Yes → Check for:
│  ├─ String handling (exclude strings from sequence types)
│  ├─ Circular references (add visited tracking)
│  └─ Missing base case (add explicit checks)
│
└─ No → Is the output wrong?
   ├─ Structure lost → Check dict vs list handling
   ├─ Type changed → Add type preservation
   ├─ Values missing → Check pattern matching
   └─ Wrong values → Trace execution with prints

The Journey: From Problem to Solution

The Complete Evolution

Iteration | Failure Mode | Technique Applied | Result | Complexity Impact
0  | Only handles lists | Basic list recursion | Works for simple lists | O(n) time, O(d) space
1  | Tuples not flattened | Multi-type checking | Handles list and tuple | O(n) time, O(d) space
1b | Infinite recursion on strings | String exclusion | Strings treated as atomic | O(n) time, O(d) space
2  | Dicts flattened incorrectly | Dict-specific handling | Preserves dict structure | O(n) time, O(d) space
3  | Exact key matching too narrow | Pattern matching | Catches compound keys | O(n·m) time, O(d) space
3b | Inflexible masking | Custom masking functions | Flexible transformations | O(n) time, O(d) space

where n = total values, d = max depth, m = number of patterns

Final Implementation: Production-Ready Nested Data Processor

def process_nested_data(
    data,
    transform_fn=None,
    key_filter=None,
    visited=None,
    preserve_types=True
):
    """
    Production-ready nested data processor with full feature set.

    Args:
        data: Nested structure to process (dict, list, tuple, or atomic value)
        transform_fn: Function applied to the values selected for transformation
                      (default: identity)
        key_filter: Optional function (key -> bool). If provided, only values
                    under matching dict keys are passed to transform_fn; if
                    omitted, transform_fn is applied to every atomic value.
        visited: Set of visited object IDs (for circular reference detection)
        preserve_types: Whether to preserve tuple vs list distinction

    Returns:
        Processed structure with same shape as input

    Features:
        - Handles arbitrary nesting depth
        - Detects circular references
        - Preserves or normalizes types
        - Flexible transformation logic
    """
    # Initialize defaults
    if transform_fn is None:
        transform_fn = lambda x: x

    if visited is None:
        visited = set()

    # Circular reference detection
    obj_id = id(data)
    if obj_id in visited:
        return "[CIRCULAR REFERENCE]"

    # Process based on type
    try:
        if isinstance(data, dict):
            visited.add(obj_id)
            result = {}
            for key, value in data.items():
                if key_filter is not None and key_filter(key):
                    # Transform the value under a matching key directly
                    result[key] = transform_fn(value)
                else:
                    # Recurse on the value
                    result[key] = process_nested_data(
                        value, transform_fn, key_filter, visited, preserve_types
                    )
            return result

        elif isinstance(data, (list, tuple)):
            visited.add(obj_id)
            processed = [
                process_nested_data(item, transform_fn, key_filter, visited, preserve_types)
                for item in data
            ]
            # Preserve type if requested
            if preserve_types:
                return type(data)(processed)
            else:
                return processed

        else:
            # Base case: atomic value.
            # With no key_filter, every atomic value is transformed; with a
            # key_filter, only values under matching keys are transformed
            # (handled in the dict branch above).
            return data if key_filter is not None else transform_fn(data)

    finally:
        # Clean up visited tracking (allows same object in different branches)
        if obj_id in visited:
            visited.remove(obj_id)

# Example 1: Mask sensitive data
def mask_value(value):
    return "***MASKED***"

def is_sensitive_key(key):
    sensitive_patterns = ["password", "secret", "token", "api_key"]
    return any(pattern in key.lower() for pattern in sensitive_patterns)

config = {
    "database": {
        "host": "db.example.com",
        "password": "secret123",
        "replicas": [
            {"host": "db2.example.com", "password": "replica_pass"}
        ]
    }
}

masked = process_nested_data(config, mask_value, is_sensitive_key)
print("Example 1: Masked config")
print(json.dumps(masked, indent=2))

# Example 2: Convert all strings to uppercase
def uppercase(value):
    return value.upper() if isinstance(value, str) else value

data = {
    "name": "alice",
    "settings": {
        "theme": "dark",
        "language": "en"
    }
}

uppercased = process_nested_data(data, uppercase)
print("\nExample 2: Uppercased strings")
print(json.dumps(uppercased, indent=2))

# Example 3: Collect all numeric values
def collect_numbers(data):
    """Specialized function to collect all numbers from nested structure"""
    numbers = []

    def collect(value):
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numbers.append(value)
        return value

    process_nested_data(data, collect)
    return numbers

data = {
    "a": 1,
    "b": {
        "c": 2,
        "d": [3, 4, {"e": 5}]
    }
}

numbers = collect_numbers(data)
print(f"\nExample 3: Collected numbers: {numbers}")

Python Output:

Example 1: Masked config
{
  "database": {
    "host": "db.example.com",
    "password": "***MASKED***",
    "replicas": [
      {
        "host": "db2.example.com",
        "password": "***MASKED***"
      }
    ]
  }
}

Example 2: Uppercased strings
{
  "name": "ALICE",
  "settings": {
    "theme": "DARK",
    "language": "EN"
  }
}

Example 3: Collected numbers: [1, 2, 3, 4, 5]

Decision Framework: Which Approach When?

Goal | Approach | When to Use
Flatten to single level | flatten_completely() | Need a simple list of all values, structure not important
Flatten dict keys | flatten_dict() | Configuration validation, environment variables, logging
Transform values | process_nested_data() | Masking, validation, type conversion, data sanitization
Find paths | find_all_paths() | Debugging, searching, duplicate detection
Query by path | get_nested_value() | Configuration access, API response parsing

Complexity Analysis

Time Complexity: O(n)

  - Each value in the structure is visited exactly once
  - Dict iteration: O(k), where k is the number of keys
  - List iteration: O(m), where m is the number of items
  - Total: O(n), where n is the total number of values

Space Complexity: O(d)

  - Call stack depth: d (maximum nesting depth)
  - Each recursive call stores constant data (current dict/list reference)
  - Visited set: O(v), where v is the number of container objects
  - Total: O(d + v); typically O(d) dominates

Recurrence Relation:

T(n) = k·T(n/k) + O(1)

where k is the branching factor (number of children per container)

For balanced structures, this solves to O(n) by the Master Theorem (case 1: the O(1) work per call is dominated by the n leaves).
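
One way to see why the answer is linear without invoking the theorem: at level i of the recursion there are about k^i calls, each doing O(1) work besides its recursive calls, so the internal work is c·(1 + k + k^2 + ... + k^(log_k n - 1)) = c·(n - 1)/(k - 1) = O(n); adding the O(n) work done at the n leaves still gives O(n) overall.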

Lessons Learned

1. Type-based recursion is fundamentally different
   - List recursion: first + rest pattern
   - Tree recursion: node + children pattern
   - Nested-structure recursion: the value's type determines the recursion strategy

2. Strings are the edge case that breaks everything
   - Always explicitly exclude strings from sequence handling
   - Use isinstance(item, (list, tuple)), not isinstance(item, collections.abc.Sequence) (see the short check after this list)

3. Circular references require explicit tracking
   - Use id() to track object identity
   - Clean up the visited set during backtracking
   - Give a clear indication when a cycle is detected

4. Flexibility comes from separation of concerns
   - Traversal logic (how to recurse)
   - Transformation logic (what to do with values)
   - Filtering logic (which keys to transform)
   - Keep these separate for maximum reusability

5. The right abstraction makes everything easier
   - The generic process_nested_data() handles all cases
   - Specific functions (mask_sensitive_data(), flatten_dict()) are thin wrappers
   - Build the general tool first, then specialize
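Why the string exclusion matters can be verified directly; this is a small standalone check, not part of the chapter's code:

import collections.abc

# A string is itself a Sequence, and each of its characters is another
# one-character string -- so a generic Sequence check would recurse into
# strings and never reach an atomic base case.
print(isinstance("secret", collections.abc.Sequence))     # True
print(isinstance("secret"[0], collections.abc.Sequence))  # True -- and so on, forever
print(isinstance("secret", (list, tuple)))                 # False: strings stay atomic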

Project: Build a Nested Data Explorer

Project Overview

Build a command-line tool that explores and analyzes nested data structures (JSON, Python dicts). Your tool should support:

  1. Interactive exploration: Navigate through nested structures
  2. Search functionality: Find values, keys, or patterns
  3. Statistics: Analyze structure depth, size, types
  4. Transformation: Apply operations to nested data
  5. Visualization: Display structure in readable format

Project Requirements

Core Features (Required)

1. Load and Display
   - Load JSON from a file or a Python dict from code
   - Display the structure with proper indentation
   - Show types and values clearly

2. Navigation
   - Navigate to specific paths (e.g., database.primary.host)
   - List all keys at the current level
   - Move up/down the hierarchy

3. Search
   - Find all paths containing a specific value
   - Find all keys matching a pattern
   - Find all values of a specific type

4. Statistics
   - Calculate maximum nesting depth
   - Count the total number of values
   - Count values by type
   - Identify the largest nested structures

5. Transformation
   - Mask sensitive data
   - Convert all strings to uppercase/lowercase
   - Remove keys matching a pattern
   - Flatten the structure

Advanced Features (Optional)

6. Diff Tool
   - Compare two nested structures
   - Show what changed (added, removed, modified)
   - Highlight differences

7. Query Language
   - Support JSONPath-like queries
   - Filter by conditions
   - Extract matching subtrees

8. Validation
   - Validate a structure against a schema
   - Check for required keys
   - Verify value types

Starter Code

Here's a framework to get you started:

import json
from typing import Any, List, Dict, Callable

class NestedDataExplorer:
    """
    Interactive explorer for nested data structures.

    Your task: Implement the methods marked with TODO.
    """

    def __init__(self, data: Any):
        """Initialize explorer with data structure"""
        self.data = data
        self.current_path = []

    def display(self, data: Any = None, indent: int = 0) -> None:
        """
        Display nested structure with proper formatting.

        TODO: Implement this method
        - Show dicts with their keys
        - Show lists with indices
        - Show atomic values with their types
        - Use indentation to show nesting
        """
        pass

    def navigate_to(self, path: str) -> Any:
        """
        Navigate to specific path in structure.

        TODO: Implement this method
        - Parse dot-notation path (e.g., "database.primary.host")
        - Handle list indices (e.g., "replicas.0.host")
        - Return value at path or None if not found
        """
        pass

    def find_value(self, target: Any) -> List[str]:
        """
        Find all paths containing target value.

        TODO: Implement this method
        - Search entire structure recursively
        - Return list of paths where value was found
        - Handle all data types correctly
        """
        pass

    def find_keys(self, pattern: str) -> List[str]:
        """
        Find all keys matching pattern (substring match).

        TODO: Implement this method
        - Search all dict keys recursively
        - Return paths to matching keys
        - Case-insensitive matching
        """
        pass

    def calculate_depth(self, data: Any = None) -> int:
        """
        Calculate maximum nesting depth.

        TODO: Implement this method
        - Return 0 for atomic values
        - Return 1 + max(child depths) for containers
        - Handle both dicts and lists
        """
        pass

    def count_values(self, data: Any = None) -> int:
        """
        Count total number of values in structure.

        TODO: Implement this method
        - Count all atomic values
        - Don't count containers themselves
        - Handle nested structures correctly
        """
        pass

    def type_statistics(self, data: Any = None) -> Dict[str, int]:
        """
        Count values by type.

        TODO: Implement this method
        - Return dict mapping type names to counts
        - Example: {"str": 5, "int": 3, "bool": 1}
        - Handle nested structures recursively
        """
        pass

    def mask_sensitive(self, patterns: List[str]) -> Any:
        """
        Mask values for keys matching patterns.

        TODO: Implement this method
        - Find keys containing any pattern
        - Replace their values with "***MASKED***"
        - Return new structure (don't modify original)
        """
        pass

    def flatten_to_paths(self, data: Any = None) -> Dict[str, Any]:
        """
        Flatten structure to path: value mapping.

        TODO: Implement this method
        - Convert nested structure to flat dict
        - Keys are dot-notation paths
        - Example: {"database.host": "localhost"}
        """
        pass

# Example usage and test cases
if __name__ == "__main__":
    # Sample data
    config = {
        "database": {
            "primary": {
                "host": "db1.example.com",
                "port": 5432,
                "password": "secret123"
            },
            "replicas": [
                {"host": "db2.example.com", "port": 5432},
                {"host": "db3.example.com", "port": 5432}
            ]
        },
        "cache": {
            "redis": {
                "host": "cache.example.com",
                "port": 6379
            }
        },
        "debug": True
    }

    # Create explorer
    explorer = NestedDataExplorer(config)

    # Test your implementation
    print("=== Display Structure ===")
    explorer.display()

    print("\n=== Navigate to Path ===")
    value = explorer.navigate_to("database.primary.host")
    print(f"database.primary.host = {value}")

    print("\n=== Find Value ===")
    paths = explorer.find_value(5432)
    print(f"Paths containing 5432: {paths}")

    print("\n=== Find Keys ===")
    paths = explorer.find_keys("host")
    print(f"Paths with 'host' in key: {paths}")

    print("\n=== Statistics ===")
    print(f"Max depth: {explorer.calculate_depth()}")
    print(f"Total values: {explorer.count_values()}")
    print(f"Type counts: {explorer.type_statistics()}")

    print("\n=== Mask Sensitive ===")
    masked = explorer.mask_sensitive(["password", "secret"])
    print(json.dumps(masked, indent=2))

    print("\n=== Flatten ===")
    flat = explorer.flatten_to_paths()
    for path, value in sorted(flat.items()):
        print(f"{path}: {value}")

Implementation Guide

Phase 1: Core Traversal (Start Here)

  1. Implement display() first
     - It helps you understand the structure as you work
     - Use recursion with depth tracking
     - Format the output clearly

  2. Implement navigate_to()
     - Parse the path string into parts
     - Navigate step by step
     - Handle missing keys gracefully (one possible shape is sketched after this list)

  3. Implement calculate_depth()
     - A simple recursive function
     - Base case: atomic value → 0
     - Recursive case: 1 + max(child depths)
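If you get stuck on the dot-notation parsing, here is one possible shape for navigate_to(), under the assumption that list indices appear as bare integers in the path (e.g. "replicas.0.host"). Treat it as a starting point, not the required solution:

def navigate_to_sketch(data, path):
    """Walk a dot-notation path one part at a time, treating numeric
    parts as list indices; return None if any step is missing."""
    current = data
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return None  # missing key or out-of-range index
    return current

# navigate_to_sketch(config, "database.replicas.0.host") -> "db2.example.com"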

Phase 2: Search and Analysis

  1. Implement find_value()
     - Reuse the pattern from earlier in the chapter
     - Build the path as you recurse
     - Collect all matches

  2. Implement find_keys()
     - Similar to find_value(), but check keys
     - Use substring matching
     - Compare case-insensitively

  3. Implement count_values() and type_statistics()
     - Traverse the entire structure
     - Accumulate counts
     - Distinguish containers from values (a possible shape for type_statistics() is sketched after this list)
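As a reference point for the counting methods, here is one possible shape for type_statistics(); it is an independent sketch, not the official solution, and count_values() follows the same traversal with a single counter:

from collections import Counter

def type_statistics_sketch(data):
    """Tally the type name of every atomic value; containers are
    traversed but not counted themselves."""
    counts = Counter()

    def walk(value):
        if isinstance(value, dict):
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)
        else:
            counts[type(value).__name__] += 1

    walk(data)
    return dict(counts)

# type_statistics_sketch({"a": 1, "b": {"c": "x", "d": True}})
# -> {'int': 1, 'str': 1, 'bool': 1}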

Phase 3: Transformations

  1. Implement mask_sensitive()
     - Reuse the pattern from Iteration 3
     - Check keys against the patterns
     - Return a new structure

  2. Implement flatten_to_paths()
     - Reuse the pattern from section 16.6
     - Build path strings as you recurse
     - Return a flat dictionary (a possible shape is sketched after this list)
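And here is one possible shape for flatten_to_paths(), assuming list indices become path segments the same way dict keys do; adapt it to whatever path convention your navigate_to() uses:

def flatten_to_paths_sketch(data, prefix=""):
    """Return a flat {dot.path: value} dict containing every atomic value."""
    if isinstance(data, dict):
        items = list(data.items())
    elif isinstance(data, list):
        items = [(str(i), v) for i, v in enumerate(data)]
    else:
        return {prefix: data}  # atomic value reached

    flat = {}
    for key, value in items:
        child_path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten_to_paths_sketch(value, child_path))
    return flat

# flatten_to_paths_sketch({"db": {"host": "x", "ports": [1, 2]}})
# -> {'db.host': 'x', 'db.ports.0': 1, 'db.ports.1': 2}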

Testing Strategy

Create test cases for the following (a small sketch of such checks appears after this list):

  1. Empty structures: {}, []
  2. Single level: {"a": 1}
  3. Deep nesting: {"a": {"b": {"c": 1}}}
  4. Mixed types: Dicts, lists, strings, numbers, booleans
  5. Edge cases: None values, empty strings, zero
  6. Large structures: Load real JSON files
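A minimal sketch of what such checks might look like, assuming you have implemented the TODO methods with the semantics described in their docstrings (the expected values here are hypothetical; adjust them to your own design decisions):

def run_edge_case_tests():
    # Empty containers hold no atomic values.
    assert NestedDataExplorer({}).count_values() == 0
    assert NestedDataExplorer([]).count_values() == 0

    # A single-level dict is one container deep.
    assert NestedDataExplorer({"a": 1}).calculate_depth() == 1

    # Deep nesting: three containers deep.
    assert NestedDataExplorer({"a": {"b": {"c": 1}}}).calculate_depth() == 3

    # None is still an atomic value.
    assert NestedDataExplorer({"a": None}).count_values() == 1

    print("All edge-case checks passed")

run_edge_case_tests()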

Extension Ideas

Once you have the core working:

  1. Add interactive mode: REPL-style interface
  2. Support file I/O: Load/save JSON files
  3. Add diff functionality: Compare two structures
  4. Implement query language: JSONPath-like syntax
  5. Add visualization: Tree-like display with ASCII art
  6. Performance optimization: Cache results, lazy evaluation

Example Output

Your completed tool should produce output like:

=== Display Structure ===
{dict}
  database: {dict}
    primary: {dict}
      host: "db1.example.com" {str}
      port: 5432 {int}
      password: "secret123" {str}
    replicas: {list}
      [0]: {dict}
        host: "db2.example.com" {str}
        port: 5432 {int}
      [1]: {dict}
        host: "db3.example.com" {str}
        port: 5432 {int}
  cache: {dict}
    redis: {dict}
      host: "cache.example.com" {str}
      port: 6379 {int}
  debug: True {bool}

=== Statistics ===
Max depth: 4
Total values: 10
Type counts: {'str': 5, 'int': 4, 'bool': 1}

Submission Requirements

Your submission should include:

  1. Complete implementation of all required methods
  2. Test suite demonstrating all features
  3. Documentation explaining your design choices
  4. Example usage with real-world data (load a JSON file)
  5. Reflection on challenges faced and lessons learned

Evaluation Criteria

Good luck! This project brings together everything you've learned about processing nested structures recursively.